Before diving into this, make sure you've checked out my previous blogs – especially the one on how math is used in Machine Learning and the basics of ML. This is Part 3 of our full ML learning series, and it’s all going to start connecting now!
Let’s move into the next big topic in ML – Data Processing. This is one of the most important steps in building any ML model. If your data is not clean, your model won't be accurate.
The first step in data processing is cleaning the data. Think of it like cleaning your room before inviting a guest — you remove the noise, fix the errors, and make it usable. Imagine you’re building a model to predict student results, and some rows have missing marks. You either remove those rows or fill them with the average — that’s data cleaning.
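Here’s a tiny sketch of that idea using pandas (the student names and marks below are just made-up examples):

```python
import pandas as pd

# Hypothetical student-results data with one missing mark
df = pd.DataFrame({
    "student": ["Asha", "Ravi", "Meena", "Kiran"],
    "marks":   [78, None, 65, 90],
})

# Option 1: drop rows that have missing marks
cleaned = df.dropna(subset=["marks"])

# Option 2: fill missing marks with the column average
df["marks"] = df["marks"].fillna(df["marks"].mean())
print(df)
```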
Feature Engineering means creating and modifying the features (columns) in your data to improve the model’s performance. We do things like:
Encoding categorical data: use Label Encoding to convert names like "Male" and "Female" into 0 and 1, or use One-Hot Encoding to turn "Red", "Blue", "Green" into three separate binary columns. Normalization rescales numerical data between 0 and 1, and Standardization transforms data using the mean and standard deviation.
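A quick sketch of all four tricks with pandas and scikit-learn (the column names and values are just for illustration):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "colour": ["Red", "Blue", "Green", "Red"],
    "marks":  [45, 88, 67, 90],
})

# Label Encoding: "Female"/"Male" -> 0/1
df["gender_encoded"] = LabelEncoder().fit_transform(df["gender"])

# One-Hot Encoding: "Red", "Blue", "Green" -> three separate binary columns
df = pd.get_dummies(df, columns=["colour"])

# Normalization: rescale marks into the 0-1 range
df["marks_norm"] = MinMaxScaler().fit_transform(df[["marks"]]).ravel()

# Standardization: subtract the mean and divide by the standard deviation
df["marks_std"] = StandardScaler().fit_transform(df[["marks"]]).ravel()
print(df)
```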
Outliers are data points that are very different from the rest — they can badly affect your model. One popular method to detect them is the IQR Method (Interquartile Range). In student marks, if most students score between 40–80 but one student has 5 marks or 100 marks, they may be outliers.
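Here’s roughly how the IQR method looks in code, on a made-up set of marks:

```python
import pandas as pd

marks = pd.Series([55, 60, 62, 48, 70, 75, 5, 100, 58, 66])

# IQR method: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is flagged
q1, q3 = marks.quantile(0.25), marks.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = marks[(marks < lower) | (marks > upper)]
print(outliers)  # flags the 5 and the 100
```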
When your dataset has too many features (like 1000+), it becomes heavy and hard to train. We use dimensionality reduction techniques to reduce the number of features while keeping most of the information. The most used method is PCA (Principal Component Analysis). It reduces, for example, 1000 features to 100, while keeping most of the meaning (the variance) of the data. Remember our previous blog about Linear Algebra? Yep — PCA uses eigenvectors and matrices from linear algebra! Now you can connect both blogs. That’s why I explained math first — it all makes sense now, right?
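A minimal PCA sketch with scikit-learn, assuming a random made-up dataset with 1000 features:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical dataset: 500 samples, 1000 features each
X = np.random.rand(500, 1000)

# Reduce 1000 features down to 100 principal components
pca = PCA(n_components=100)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (500, 100)
print(pca.explained_variance_ratio_.sum())  # how much of the variance we kept
```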
Now we move to the next concept: Supervised Learning. I don’t want to bore you again by repeating “what is supervised learning?” or “what is regression and classification?” — Because you already know those basics from my earlier blogs. Now we’ll dive deeper into how supervised learning actually works.
In regression, we have two main concepts: Gradient Descent and the Cost Function.
Gradient Descent is the heart of most machine learning models. Just imagine you're walking down a hill blindfolded — you take small steps and slowly reach the lowest point. That’s what gradient descent does — it keeps updating the model to reach the lowest error (loss).
Cost Function is how we measure the error between predicted and actual output. The most used cost functions in regression are MSE (Mean Squared Error) and MAE (Mean Absolute Error).
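To make this concrete, here’s a minimal sketch of gradient descent minimizing the MSE cost for a simple one-feature linear model (the data points are made up, roughly following y = 2x + 1):

```python
import numpy as np

# Tiny made-up dataset: y is roughly 2*x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 5.0, 6.9, 9.2, 11.1])

w, b = 0.0, 0.0   # start from a rough guess
lr = 0.01         # learning rate: the size of each "small step"

for _ in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    cost = np.mean(error ** 2)      # MSE cost function
    # Gradients of MSE with respect to w and b
    dw = 2 * np.mean(error * x)
    db = 2 * np.mean(error)
    # Take a small step downhill
    w -= lr * dw
    b -= lr * db

print(w, b, cost)  # w ends up near 2, b near 1, cost close to 0
```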
In classification, we often use logistic regression, which relies on the Sigmoid Function, which converts any value into a probability between 0 and 1, and Binary Cross-Entropy, which is the cost function for classification problems. It tells us how far the predicted probability is from the actual label.
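Here’s a small sketch of both ideas in plain NumPy (the scores and labels below are made up):

```python
import numpy as np

def sigmoid(z):
    # Squashes any value into a probability between 0 and 1
    return 1 / (1 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob):
    # Measures how far the predicted probabilities are from the actual 0/1 labels
    eps = 1e-12  # avoid log(0)
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 1])
y_prob = sigmoid(np.array([2.0, -1.5, 0.3, 3.0]))  # raw model scores -> probabilities
print(binary_cross_entropy(y_true, y_prob))
```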
A decision tree splits data into branches based on conditions (if-else style). Random Forest is a collection of many decision trees combined. It uses ensemble learning, but we’ll explore that in the next blog — so we’ll keep it aside for now. Support Vector Machine (SVM) tries to draw a line (or hyperplane) that separates different classes as cleanly as possible.
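A quick sketch comparing the three with scikit-learn on its built-in Iris dataset (just to show the API, not a serious benchmark):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train each model and print its test accuracy
for model in (DecisionTreeClassifier(), RandomForestClassifier(), SVC()):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```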
Bias-Variance Trade-Off is an important concept that helps us understand how well our model is learning. Too much bias leads to underfitting due to wrong assumptions, and too much variance leads to overfitting due to sensitivity to the training data. But our goal is to balance them — a good model has low bias and low variance. Sometimes our model learns too well and starts overfitting. To prevent that, we use regularization. Lasso (L1) → Can remove some features by reducing their weight to zero. Ridge (L2) → Reduces the impact of less important features but doesn’t remove them. It’s like giving less priority to unnecessary subjects while studying for a competitive exam.
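Here’s a small sketch of Lasso vs Ridge with scikit-learn, on made-up data where only the first feature actually matters:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Made-up data: only the first of the five features really drives y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients:", lasso.coef_)  # unimportant features pushed to exactly 0
print("Ridge coefficients:", ridge.coef_)  # unimportant features shrunk, but not removed
```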
We use optimization to minimize loss (error) — and we already saw this in gradient descent. Convex functions are smooth curves with one minimum point. This helps gradient descent not get stuck in the wrong place (local minima). So now you can relate everything — from gradient descent to math to model building. See? Math isn't boring when you know what it's doing!
Now we are moving to Unsupervised Learning. As I said earlier, we’re not going to waste time explaining “What is Unsupervised Learning?” again — we’ve already covered it. Let’s dive directly into the main concepts and understand the depth of it.
Imagine you’re grouping people in India based on their food habits. Without knowing their region or name, just based on behavior — like whether they eat rice, chapati, spicy food, etc., you can guess their region — South, North, West. That’s what K-Means does — groups similar data automatically. It keeps updating the centers (called centroids) until the groups are as accurate as possible.
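A minimal K-Means sketch with scikit-learn; the “food habit” scores below are completely made up just to show the idea:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical preference scores per person: [rice, chapati, spicy food], each 0-10
X = np.array([
    [9, 2, 8], [8, 3, 9], [9, 1, 7],   # looks like one group
    [2, 9, 4], [3, 8, 5], [1, 9, 3],   # looks like another group
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(kmeans.labels_)           # which cluster each person landed in
print(kmeans.cluster_centers_)  # the final centroids
```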
Let’s say you’re organizing your Google Drive. First, you have big folders like “Work” and “College.” Inside “College,” you have folders for each semester, and inside that, you have subjects. That’s hierarchical — a top-to-bottom grouping. There are two approaches: Agglomerative (bottom-up): Start with individual data points and group them. Divisive (top-down): Start with everything in one cluster and break it down.
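Here’s a tiny sketch of the agglomerative (bottom-up) version with scikit-learn, on made-up 2D points:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 1], [1.2, 1.1], [5, 5], [5.1, 4.9], [9, 9], [9.2, 8.8]])

# Agglomerative: each point starts as its own cluster, closest pairs get merged
agg = AgglomerativeClustering(n_clusters=3).fit(X)
print(agg.labels_)  # three groups of two points each
```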
We’ve already covered dimensionality reduction earlier — and how PCA (Principal Component Analysis) works. You already know this. Math behind PCA? Already covered — you can now go back and relate your linear algebra knowledge here — especially eigenvectors and matrix multiplication. ✅ Just one task for you: Google “PCA formula” and relate what you see with what we’ve learned so far.
Yeah, now we’re almost at the end of this blog. We've learned a lot about ML till now — from supervised to unsupervised learning. But wait, before we move to reinforcement learning, ensemble methods, and neural networks in the next blog, one thing is super important — how do we know our model is working properly? That’s where evaluation metrics come in.
There are mainly two types: Classification metrics and Regression metrics. Classification metrics are used when our model is saying “Yes or No”, “Cat or Dog”, like that — it’s deciding between classes. On the other side, Regression metrics are used when our model is predicting numbers — like salary prediction, temperature, marks, etc.
So in classification, we use the following (there’s a quick code sketch right after this list):
Accuracy – It tells how many predictions are correct out of all.
Precision – Out of all predicted positives, how many were actually positive?
Recall – Out of all actual positives, how many did we catch correctly?
F1 Score – It’s a balance between precision and recall; technically, it’s the harmonic mean of the two.
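Here’s a quick sketch of all four with scikit-learn, on made-up labels and predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model's predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))
```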
In regression, we use these (again, a quick code sketch follows the list):
R2 Score – This tells how well our model is fitting. If it’s near 1, it’s great.
RMSE (Root Mean Squared Error) – Just tells the average error, but punishes big mistakes more.
MAE (Mean Absolute Error) – Tells average error in simpler form.
MAPE (Mean Absolute Percentage Error) – Tells the average percentage difference between predicted and actual.
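And a similar sketch for the regression metrics, on made-up marks (note that mean_absolute_percentage_error needs a reasonably recent scikit-learn version):

```python
import numpy as np
from sklearn.metrics import (r2_score, mean_squared_error,
                             mean_absolute_error, mean_absolute_percentage_error)

y_true = np.array([50, 60, 72, 85, 90])   # actual marks
y_pred = np.array([52, 58, 75, 80, 93])   # predicted marks

print("R2   :", r2_score(y_true, y_pred))
print("RMSE :", np.sqrt(mean_squared_error(y_true, y_pred)))
print("MAE  :", mean_absolute_error(y_true, y_pred))
print("MAPE :", mean_absolute_percentage_error(y_true, y_pred))
```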
And now come two cool techniques. Cross-validation – we split our data into multiple parts, then train and test in rotation, so we know our model is stable and not just lucky once. Bootstrapping – this is like making many random samples from our data (with replacement) and testing again and again, to see how consistent our model is.
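A quick sketch of both ideas with scikit-learn on the Iris dataset (the 20 bootstrap rounds are just an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.utils import resample
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Cross-validation: split into 5 parts, train and test in rotation
scores = cross_val_score(model, X, y, cv=5)
print("CV scores:", scores)

# Bootstrapping: resample with replacement many times and check consistency
boot_scores = []
for i in range(20):
    X_boot, y_boot = resample(X, y, random_state=i)
    model.fit(X_boot, y_boot)
    boot_scores.append(accuracy_score(y, model.predict(X)))
print("Bootstrap mean accuracy:", np.mean(boot_scores))
```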
That’s all for this blog. Don’t worry — we are not stopping here. Just stay tuned: the real fun, like reinforcement learning, ensemble methods, and deep neural networks, is coming up next!
Click here to contact me for full materials.


